Entropy-based Scheduling Policy for Cross Aggregate Ranking Workloads
Authors
Abstract
Many data exploration applications require the ability to identify the top-k results according to a scoring function. We study a class of top-k ranking problems in which the top-k candidates in one dataset are scored with the assistance of another set. We call this class of workloads cross aggregate ranking. Example computation problems include evaluating the Hausdorff distance between two datasets, finding the medoid or radius within one dataset, and finding the closest or farthest pair between two datasets. In this paper, we propose a parallel and distributed solution to process cross aggregate ranking workloads. Our solution subdivides the aggregate score computation of each candidate into tasks while constantly maintaining the tentative top-k results as an uncertain top-k result set. The crux of our proposed approach lies in our entropy-based scheduling technique, which determines result-yielding tasks based on their ability to reduce the uncertainty of the tentative result set. Experimental results show that our proposed approach consistently outperforms the best existing approach on two different types of cross aggregate ranking workloads using real datasets.
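To make the workload class concrete, the following is a minimal brute-force sketch of cross aggregate ranking in Python. It is not the paper's parallel, entropy-scheduled method; it only illustrates how the Hausdorff distance and the medoid mentioned above can both be phrased as "score each candidate by aggregating over another set, then take the top-k". The function names (`cross_aggregate_top_k`, `directed_hausdorff`, `hausdorff`, `medoid`) and the choice of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math


def cross_aggregate_top_k(candidates, others, pair_score, aggregate, k, largest=True):
    """Brute-force baseline (hypothetical helper): score every candidate by
    aggregating pair_score(candidate, other) over all of `others`, then
    return the k best (score, candidate) pairs."""
    scored = [(aggregate(pair_score(c, o) for o in others), c) for c in candidates]
    pick = heapq.nlargest if largest else heapq.nsmallest
    return pick(k, scored, key=lambda t: t[0])


def directed_hausdorff(a, b):
    # Top-1 candidate in `a` under the aggregate "distance to nearest point of b",
    # ranked from largest to smallest.
    return cross_aggregate_top_k(a, b, math.dist, min, k=1)[0][0]


def hausdorff(a, b):
    # Symmetric Hausdorff distance: the larger of the two directed distances.
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def medoid(points):
    # Top-1 candidate under the aggregate "sum of distances to its own dataset",
    # ranked from smallest to largest.
    return cross_aggregate_top_k(points, points, math.dist, sum, k=1, largest=False)[0][1]


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 0.0), (4.0, 0.0)]
    print(hausdorff(A, B))
    print(medoid([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))
```

This baseline evaluates every candidate-to-other pair, which is the cost that a task-based scheduler of the kind described in the abstract would aim to avoid by computing only the partial aggregates needed to settle the top-k set.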
Similar resources
Improving Search-based Parallel Job Scheduler
To balance performance goals and allow administrators to declaratively specify high-level objectives, we have proposed a goal-oriented scheduling framework by designing an objective model and a scheduling policy based on combinatorial search techniques to achieve the objective. In this work, we further evaluate our new policy on various real workloads including (1) ten monthly workloads that ran...
Ecological Efficiency Based Ranking of Cities: A Combined DEA Cross-Efficiency and Shannon’s Entropy Method
In this paper, a method is proposed to calculate a comprehensive index that measures the ecological efficiency of a city by combining the measurements provided by several Data Envelopment Analysis (DEA) cross-efficiency models using Shannon’s entropy index. The DEA models include non-discretionary uncontrollable inputs, desirable and undesirable outputs. The method is implemented to...
Scheduling Multiple Data Visualization Query Workloads on a Shared Memory Machine
Query scheduling plays an important role when systems are faced with limited resources and high workloads. It becomes even more relevant for servers applying multiple query optimization techniques to batches of queries, in which portions of datasets as well as intermediate results are maintained in memory to speed up query evaluation. In this work, we present a dynamic query scheduling model ba...
Optimum Aggregate Inventory for Scheduling Multi-product Single Machine System with Zero Setup Time
In this paper we adopt the common cycle approach to the economic lot scheduling problem and minimize the maximum aggregate inventory. We allow idle times between any two consecutive products and consider limited capital for investment in inventory. We assume the setup times are negligible. To achieve the optimal investment in inventory we first find the idle times which minimi...
Cycle Time Optimization of Processes Using an Entropy-Based Learning for Task Allocation
Cycle time optimization is one of the great challenges in business process management. Although there is much research on this subject, task similarities have received little attention. In this paper, a new approach is proposed to optimize cycle time by minimizing the entropy of work lists in resource allocation while keeping workloads balanced. The idea of the entropy of work lists comes fr...